Abstract. We describe the onset of condensation in the simple model for the balance between
selection and mutation given by Kingman in terms of a scaling limit theorem. Loosely speaking,
this shows that the wave moving towards genes of maximal fitness has the shape of a gamma
distribution. We conjecture that this wave shape is a universal phenomenon that can also be
found in a variety of more complex models, well beyond the genetics context, and provide some
further evidence for this.

1. Introduction and statement of the result

In [9] Kingman proposes and analyses a simple model for the distribution of fitness in a population
undergoing selection and mutation. The characteristic feature of this model is that
the fitness of genes before and after mutation is modelled as independent, the mutation having
destroyed the biochemical 'house of cards' built up by evolution. Kingman shows that in
his model the distribution of the fitness in the population converges to a limiting distribution.
There are two phases: when mutation is favoured over selection, the limiting distribution is a
skewed version of the fitness distribution of a mutant. But if selection is favoured over mutation,
a condensation effect occurs, and we find that a positive proportion of the population in late
generations has fitness very near the optimal value, leading to the emergence of an atom at the
maximal fitness value in the limiting distribution. Physicists have argued that this is akin to
the effect of Bose-Einstein condensation, in which for a dilute gas of weakly interacting bosons
at very low temperatures a fraction of the bosons occupy the lowest possible quantum state, see
for example [2]. In the present paper, we focus on the Kingman model and discuss the form
of the fitness distribution for that part of the population that eventually forms the atom in the
limiting distribution. After stating our theorem and giving a proof we will draw comparisons to
other models in a discussion section at the end of this paper.

Mathematically, Kingman's model consists of a sequence of probability measures on the unit
interval [0, 1] describing the distribution of fitness values in the nth generation of a population.
The parameters of the model are a mutant fitness distribution $q$ on [0, 1] and some $0 < \beta < 1$
determining the relation between mutation and selection. If $p_n$ is the fitness distribution in the
nth generation we denote by

$$w_n = \int x\,p_n(dx)$$

the mean fitness and define

$$p_{n+1}(dx) = (1-\beta)\,w_n^{-1}\,x\,p_n(dx) + \beta\,q(dx).$$
Loosely speaking, a proportion $1-\beta$ of the genes in the new generation are resampled from
the existing population using their fitness as a selective criterion, and the rest have undergone
mutation and are therefore sampled from the fitness distribution $q$.
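A direct way to get a feel for the recursion is to iterate it on a finite grid of fitness values. The sketch below does this under illustrative assumptions that are not part of the text: $q(dx) = 2(1-x)\,dx$ discretised at N midpoints, $p_0$ uniform on $[0, 1/2]$ and $\beta = 0.25$. The function name `kingman_step` is hypothetical. On a finite grid the condensate can only form at the largest grid point, so this shows the mechanics of selection and mutation rather than the exact limit.

```python
import numpy as np

def kingman_step(p, x, q, beta):
    """One generation p_n -> p_{n+1} of Kingman's model on the grid x.

    p and q are probability weights on the grid points x.
    Returns the new weights and the mean fitness w_n of the old ones.
    """
    w = float(x @ p)                               # w_n = int x p_n(dx)
    p_next = (1.0 - beta) * x * p / w + beta * q   # selection plus mutation
    return p_next, w

# Illustrative assumptions: N grid midpoints, q(dx) = 2(1-x)dx, p_0 uniform on [0, 1/2].
N, beta = 2000, 0.25
x = (np.arange(N) + 0.5) / N
q = 2.0 * (1.0 - x)
q /= q.sum()
p = (x < 0.5).astype(float)
p /= p.sum()

for n in range(200):
    p, w = kingman_step(p, x, q, beta)

print("mean fitness:", w, "mass above 0.95:", p[x > 0.95].sum())
```

Each step preserves total mass exactly, since $\int (1-\beta)x\,p_n(dx)/w_n = 1-\beta$.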

We assume throughout that the mutant fitness distribution near its tip is stochastically larger
than the fitness distribution in the initial population, in the sense that the moments

$$m_n := \int x^n\,p_0(dx) \qquad\text{and}\qquad \mu_n := \int x^n\,q(dx)$$

satisfy

$$\lim_{n\to\infty}\frac{m_n}{\mu_n} = 0.$$



Under this (or, indeed, a weaker) assumption, Kingman showed that $(p_n)$ converges to a limit
distribution $p(dx)$, which does not depend on $p_0$. Moreover, $p$ is absolutely continuous with
respect to $q$ if and only if

$$\beta\int\frac{q(dx)}{1-x} \ge 1.$$

Otherwise,

$$\gamma(\beta) := 1-\beta\int\frac{q(dx)}{1-x} > 0, \qquad (1.1)$$

and this is the case of interest to us. In this case the limiting distribution $p(dx)$ still exists, but
it has an atom at the optimal fitness 1, an effect called condensation. The limiting distribution
does not depend on $p_0$ and equals

$$p(dx) = \beta\,\frac{q(dx)}{1-x} + \gamma(\beta)\,\delta_1(dx).$$
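As a quick numerical illustration of the condensation criterion (1.1), the sketch below uses an assumed family of mutant distributions that does not appear in the text, $q(dx) = \alpha(1-x)^{\alpha-1}\,dx$ with $\alpha > 1$. For it $\int q(dx)/(1-x) = \alpha/(\alpha-1)$, so condensation occurs exactly when $\beta < (\alpha-1)/\alpha$, and then $\gamma(\beta) = 1-\beta\alpha/(\alpha-1)$. The function names are hypothetical; the midpoint rule only serves as a check on the closed form.

```python
def inv_one_minus_x_integral(alpha, N=100_000):
    """Midpoint-rule value of int_0^1 q(dx)/(1-x) for q(dx) = alpha (1-x)^(alpha-1) dx.

    The integrand is alpha (1-x)^(alpha-2); for 1 < alpha < 2 it is integrably
    singular at x = 1, which the midpoint rule never evaluates.
    """
    h = 1.0 / N
    return h * sum(alpha * (1.0 - (i + 0.5) * h) ** (alpha - 2.0) for i in range(N))

def gamma_of_beta(beta, alpha):
    """gamma(beta) from (1.1), using the closed form int q(dx)/(1-x) = alpha/(alpha-1)."""
    return 1.0 - beta * alpha / (alpha - 1.0)

alpha = 2.0
print("critical beta:", (alpha - 1.0) / alpha)
for beta in (0.1, 0.25, 0.5, 0.75):
    g = gamma_of_beta(beta, alpha)
    print(beta, "condensation" if g > 0 else "no condensation", g)
```

For $\alpha = 2$ the atom appears as soon as $\beta < 1/2$, and its mass $\gamma(\beta)$ together with $\beta\int q(dx)/(1-x)$ adds up to 1, as the formula for $p(dx)$ requires.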

Our main result describes the dynamics of condensation in terms of a scaling limit theorem 
which zooms into the neighbourhood of the maximal fitness value and shows the shape of the 
'wave' eventually forming the condensate, see Figure 1. 




Figure 1. Schematic picture of $p_n$. On the right the wave is a high peak with length of
order $1/n$ and height of order $n$. By contrast, the bulk has height and length of order one.
Theorem 1. Suppose that the fitness distribution $q$ satisfies

$$\lim_{h\downarrow 0}\frac{q(1-h,1)}{h^{\alpha}} = 1, \qquad (1.2)$$

where $\alpha > 1$, and that (1.1) holds. Then, for $x > 0$,

$$\lim_{n\to\infty} p_n\Big(1-\frac{x}{n},\,1\Big) = \frac{\gamma(\beta)}{\Gamma(\alpha)}\int_0^x y^{\alpha-1}e^{-y}\,dy. \qquad (1.3)$$

We remark that the total mass in the 'wave' moving towards the maximal fitness value agrees
with the mass of the atom in the limiting distribution $p(dx)$. Its rescaled shape is that of a
gamma distribution with shape parameter $\alpha$.
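Theorem 1 can be checked numerically in a concrete case. The sketch below assumes, purely for illustration, $q(dx) = 2(1-x)\,dx$ (so (1.2) holds with $\alpha = 2$ and $\gamma(\beta) = 1-2\beta$), $p_0$ uniform on $[0, 1/2]$ (so $m_n/\mu_n \to 0$) and $\beta = 0.25$. It evaluates $p_n(1-x/n, 1)$ through the representation (2.2) and the renewal equation for $u_n = W_n(1-\beta)^{1-n}$ from Section 2. In these variables the weight in (2.2) simplifies to $(W_{n-k}/W_n)(1-\beta)^k = u_{n-k}/u_n$, which avoids underflow. All function names are hypothetical.

```python
import math

BETA = 0.25                      # gamma(beta) = 1 - 2*beta = 0.5 for the assumed q

def mu(r):
    # mu_r = int x^r q(dx) for the assumed q(dx) = 2(1-x) dx (alpha = 2)
    return 2.0 / ((r + 1) * (r + 2))

def m(r):
    # m_r = int x^r p_0(dx) for the assumed p_0 = uniform on [0, 1/2]
    return 0.5 ** r / (r + 1)

def renewal_u(n, beta=BETA):
    # u_k = W_k (1-beta)^(1-k) from u_k = beta/(1-beta) * sum_{r<k} u_{k-r} mu_r + m_k
    b = beta / (1.0 - beta)
    u = [0.0] * (n + 1)
    for k in range(1, n + 1):
        u[k] = b * sum(u[k - r] * mu(r) for r in range(1, k)) + m(k)
    return u

def tail_q_moment(k, a):
    # int_a^1 y^k q(dy) for q(dy) = 2(1-y) dy (closed form)
    return 2.0 * ((1 - a ** (k + 1)) / (k + 1) - (1 - a ** (k + 2)) / (k + 2))

def wave_mass(n, x, u, beta=BETA):
    # p_n((1 - x/n, 1]) via (2.2); the p_0 term vanishes since p_0 lives on [0, 1/2]
    a = 1.0 - x / n
    assert a >= 0.5
    return beta / u[n] * sum(u[n - k] * tail_q_moment(k, a) for k in range(n))

def gamma_limit(x, beta=BETA):
    # right-hand side of (1.3) with alpha = 2: gamma(beta) * (1 - (1 + x) e^(-x))
    return (1.0 - 2.0 * beta) * (1.0 - (1.0 + x) * math.exp(-x))

n = 1000
u = renewal_u(n)
for x in (0.5, 1.0, 3.0):
    print(x, wave_mass(n, x, u), gamma_limit(x))
```

The two columns are expected to agree up to corrections that shrink as $n$ grows; part of the gap comes from the bulk $\beta q(dx)/(1-x)$, which puts mass of order $x/n$ into the window.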

2. Proof of Theorem 1 

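Note that

$$\mu_n = \int x^n\,q(dx) \sim \Gamma(\alpha+1)\,n^{-\alpha},$$

where the asymptotics is easily derived from (1.2) (write $\mu_n = \int_0^1 n\,t^{n-1}q(t,1]\,dt$ and substitute $t = 1-s/n$), and note that

$$\sum_{n=0}^{\infty}\mu_n = \int_0^1\frac{q(dx)}{1-x} = \frac{1-\gamma(\beta)}{\beta}. \qquad (2.1)$$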
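Both facts can be checked for the illustrative family $q(dx) = \alpha(1-x)^{\alpha-1}\,dx$ (an assumption, not a choice made in the text), for which $q(1-h,1) = h^{\alpha}$ and $\mu_n = \Gamma(n+1)\Gamma(\alpha+1)/\Gamma(n+\alpha+1)$. A minimal sketch with the hypothetical helper `mu_beta`, using log-gamma to avoid overflow:

```python
import math

def mu_beta(n, alpha):
    """mu_n = int x^n q(dx) for the assumed q(dx) = alpha (1-x)^(alpha-1) dx,
    i.e. Gamma(n+1) Gamma(alpha+1) / Gamma(n+alpha+1), computed on the log scale."""
    return math.exp(math.lgamma(n + 1) + math.lgamma(alpha + 1) - math.lgamma(n + alpha + 1))

alpha = 2.5
for n in (10, 1_000, 100_000):
    # ratio mu_n / (Gamma(alpha+1) n^-alpha), which should approach 1
    print(n, n ** alpha * mu_beta(n, alpha) / math.gamma(alpha + 1))

# (2.1): sum_n mu_n = int q(dx)/(1-x) = alpha/(alpha-1); for alpha = 3 this is 1.5
print(sum(mu_beta(n, 3.0) for n in range(100_000)))
```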

Also define

$$W_n := w_0\,w_1\cdots w_{n-1}.$$

Given the family $(W_n)_{n\ge1}$ the fitness distributions can be obtained as

$$p_n(dx) = \sum_{k=0}^{n-1}\frac{W_{n-k}}{W_n}(1-\beta)^k\beta\,x^k\,q(dx) + \frac{(1-\beta)^n}{W_n}\,x^n\,p_0(dx), \qquad (2.2)$$

see [9, (2.1)]. The main tool in the proof is therefore the following lemma.
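Formula (2.2) is the recursion unrolled $n$ times, with the convention $W_0 = 1$ for the empty product. On a finite fitness set it can be compared with direct iteration; the sketch below does this for an arbitrary five-point example (the values of `x`, `q` and `p0`, and the helper names, are assumptions chosen only for the check).

```python
import numpy as np

def iterate(p0, x, q, beta, n):
    """Run Kingman's recursion n times; return p_n and the mean fitnesses w_0, ..., w_{n-1}."""
    p, ws = p0.copy(), []
    for _ in range(n):
        w = float(x @ p)
        ws.append(w)
        p = (1 - beta) * x * p / w + beta * q
    return p, ws

def formula_22(p0, x, q, beta, n, ws):
    """Right-hand side of (2.2), with W_k = w_0 ... w_{k-1} and W_0 = 1."""
    W = np.concatenate(([1.0], np.cumprod(ws)))     # W[k] = w_0 * ... * w_{k-1}
    p = sum(W[n - k] / W[n] * (1 - beta) ** k * beta * x ** k * q for k in range(n))
    return p + (1 - beta) ** n / W[n] * x ** n * p0

# Assumed toy data: five fitness values.
x = np.array([0.1, 0.3, 0.5, 0.8, 1.0])
q = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
p0 = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
beta, n = 0.3, 25

p_direct, ws = iterate(p0, x, q, beta, n)
p_formula = formula_22(p0, x, q, beta, n, ws)
print(np.max(np.abs(p_direct - p_formula)))
```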
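Lemma 2. We have, as $n\to\infty$,

$$W_n \sim c\,n^{-\alpha}(1-\beta)^{n-1},$$

where

$$c := \frac{\beta\,\Gamma(\alpha+1)}{\gamma(\beta)}\sum_{n=1}^{\infty}W_n(1-\beta)^{1-n}.$$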
Proof. Integrating (2.2) we obtain [9, (2.3)]

$$W_n = \sum_{r=1}^{n-1}W_{n-r}(1-\beta)^{r-1}\beta\,\mu_r + (1-\beta)^{n-1}m_n.$$

Abbreviate $u_n := W_n(1-\beta)^{1-n}$. Then $u_n$ satisfies the renewal equation

$$u_n = \frac{\beta}{1-\beta}\sum_{r=1}^{n-1}u_{n-r}\,\mu_r + m_n \qquad\text{for } n\ge 1.$$
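The passage from (2.2) to this renewal equation can also be checked numerically: compute $u_n$ once from the mean fitnesses of the iterated model and once from the renewal equation, which uses only the moments $\mu_r$ and $m_r$. The five-point example is again an assumption made for the check.

```python
import numpy as np

# Assumed toy data (a five-point example, as in the check of (2.2)).
x = np.array([0.1, 0.3, 0.5, 0.8, 1.0])
q = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
p0 = np.array([0.5, 0.5, 0.0, 0.0, 0.0])
beta, N = 0.3, 30

def mu(r):
    return float(x ** r @ q)      # mu_r = int x^r q(dx)

def m(r):
    return float(x ** r @ p0)     # m_r = int x^r p_0(dx)

# Route 1: iterate the model and set u_n = W_n (1-beta)^(1-n), W_n = w_0 ... w_{n-1}.
p, W, u_direct = p0.copy(), 1.0, [None]
for n in range(1, N + 1):
    w = float(x @ p)              # this is w_{n-1}
    W *= w                        # now W = W_n
    u_direct.append(W * (1 - beta) ** (1 - n))
    p = (1 - beta) * x * p / w + beta * q

# Route 2: the renewal equation u_n = beta/(1-beta) sum_{r=1}^{n-1} u_{n-r} mu_r + m_n.
u = [None]
for n in range(1, N + 1):
    u.append(beta / (1 - beta) * sum(u[n - r] * mu(r) for r in range(1, n)) + m(n))

print(max(abs(a - b) / b for a, b in zip(u[1:], u_direct[1:])))
```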
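Using (2.1) and $\mu_0 = 1$, we obtain $\frac{\beta}{1-\beta}\sum_{r=1}^{\infty}\mu_r = 1-\frac{\gamma(\beta)}{1-\beta} < 1$. Hence, the renewal theorem, see e.g. [8,
XIII.10, Theorem 1], implies that

$$\sum_{n=1}^{\infty}u_n = \frac{\sum_{n=1}^{\infty}m_n}{1-\frac{\beta}{1-\beta}\sum_{n=1}^{\infty}\mu_n} < \infty,$$

where the finiteness follows since $m_n$ is bounded by a constant multiple of $\mu_n$ and

$$\sum_{n=0}^{\infty}\mu_n < \infty.$$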
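For the assumed example $q(dx) = 2(1-x)\,dx$, $p_0$ uniform on $[0,1/2]$ and $\beta = 0.25$ (illustrative choices, not from the text) everything is explicit: $\sum_{r\ge1}\mu_r = 1$, $\sum_{n\ge1} m_n = 2\log 2 - 1$, and the defective renewal identity predicts $\sum_n u_n = (2\log 2-1)/(1-\frac13)$. The sketch compares this with partial sums of the recursion; since all $u_n$ are positive, the truncated sum should approach the prediction from below.

```python
import math

beta = 0.25
b = beta / (1 - beta)          # b * sum_{r>=1} mu_r = 1/3 = 1 - gamma(beta)/(1-beta) here

def mu(r):                     # assumed q(dx) = 2(1-x) dx
    return 2.0 / ((r + 1) * (r + 2))

def m(r):                      # assumed p_0 = uniform on [0, 1/2]
    return 0.5 ** r / (r + 1)

N = 1500
u = [0.0] * (N + 1)
for n in range(1, N + 1):
    u[n] = b * sum(u[n - r] * mu(r) for r in range(1, n)) + m(n)

partial = sum(u[1:])
# defective renewal identity: sum u_n = sum m_n / (1 - b sum_{r>=1} mu_r), with
# sum_{n>=1} m_n = 2 log 2 - 1 and sum_{r>=1} mu_r = 1 for these choices
predicted = (2 * math.log(2) - 1) / (1 - b)
print(partial, predicted)
```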

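Fix $\delta > 0$ and $0 < \varepsilon < \eta < 1$ and suppose $n$ is large enough such that $\eta n < n-1$ and

$$\mu_r \le (\Gamma(\alpha+1)+\delta)\,r^{-\alpha} \qquad\text{for all } r \ge (1-\eta)n.$$

For an inductive argument suppose that $c_1,\dots,c_{n-1}$ are chosen such that $u_r \le c_r\,r^{-\alpha}$ for all
$\varepsilon n < r \le n-1$. Since $\sum_{r=1}^{n-1}u_{n-r}\mu_r = \sum_{r=1}^{n-1}u_r\mu_{n-r}$, we bound the summands $u_r\,\mu_{n-r}$: for $r = 1,\dots,n-1$

$$u_r\,\mu_{n-r} \le \begin{cases}
(1-\varepsilon)^{-\alpha}(\Gamma(\alpha+1)+\delta)\,n^{-\alpha}\,u_r & \text{if } r \le \varepsilon n,\\
c_r\,(\Gamma(\alpha+1)+\delta)\,r^{-\alpha}(n-r)^{-\alpha} & \text{if } \varepsilon n < r \le \eta n,\\
c_r\,\eta^{-\alpha}n^{-\alpha}\,\mu_{n-r} & \text{if } \eta n < r,
\end{cases}$$

so that

$$\begin{aligned}
u_n \le \frac{\beta}{1-\beta}\Big[&(1-\varepsilon)^{-\alpha}(\Gamma(\alpha+1)+\delta)\sum_{r=1}^{\lfloor\varepsilon n\rfloor}u_r\\
&+ \underbrace{(\Gamma(\alpha+1)+\delta)\,\frac{1}{n^{\alpha}}\sum_{r=\lfloor\varepsilon n\rfloor+1}^{\lfloor\eta n\rfloor}c_r\Big(\frac{r}{n}\Big)^{-\alpha}\Big(1-\frac{r}{n}\Big)^{-\alpha}}_{(2.3)}\\
&+ \eta^{-\alpha}\sum_{r=\lfloor\eta n\rfloor+1}^{n-1}c_r\,\mu_{n-r}\Big]\,n^{-\alpha} + m_n =: c_n\,n^{-\alpha}.
\end{aligned}$$

By induction this yields a sequence $(c_n)$ with $u_n \le c_n\,n^{-\alpha}$ for all $n \ge 1$.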

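Using that $m_n n^{\alpha} \to 0$ by assumption, and that the term (2.3) is bounded by a constant multiple of

$$n^{1-\alpha}\max_{1\le r<n}c_r,$$

we see that $(c_n)$ converges to the unique solution $c^* = c^*(\varepsilon,\delta,\eta)$ of

$$c^* = (1-\varepsilon)^{-\alpha}(\Gamma(\alpha+1)+\delta)\,\frac{\beta}{1-\beta}\sum_{r=1}^{\infty}u_r + c^*\,\eta^{-\alpha}\,\frac{\beta}{1-\beta}\sum_{r=1}^{\infty}\mu_r.$$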

r=l r=l 

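Recalling that $\frac{\beta}{1-\beta}\sum_{r=1}^{\infty}\mu_r = 1-\frac{\gamma(\beta)}{1-\beta}$ and letting $\varepsilon,\delta\downarrow 0$ and $\eta\uparrow 1$ we see that $c^*(\varepsilon,\delta,\eta)$
converges to

$$c = \Gamma(\alpha+1)\,\frac{\beta}{\gamma(\beta)}\sum_{r=1}^{\infty}u_r = \Gamma(\alpha+1)\,\frac{\beta}{\gamma(\beta)}\sum_{r=1}^{\infty}W_r(1-\beta)^{1-r},$$

which yields the upper bound. The lower bound can be derived similarly. □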
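Lemma 2 can be tested numerically with the same illustrative data as before ($q(dx) = 2(1-x)\,dx$, so $\alpha = 2$ and $\gamma(\beta) = 0.5$ for $\beta = 0.25$; $p_0$ uniform on $[0,1/2]$), all of which are assumptions rather than choices made in the text. The sketch computes $u_n$ from the renewal equation, forms $c$ with the series truncated at $N$, and reports $u_N N^{\alpha}/c$, which the lemma predicts tends to 1. For $N$ in the low thousands the ratio is only expected to be within a few percent of 1; heuristically the corrections seem to decay roughly like $\log N/N$, which is not a statement from the text.

```python
import math

alpha, beta = 2.0, 0.25
gamma_beta = 1 - beta * alpha / (alpha - 1)   # (1.1) for q(dx) = 2(1-x) dx: 0.5
b = beta / (1 - beta)

def mu(r):                     # assumed q(dx) = 2(1-x) dx, so mu_r ~ Gamma(3) r^(-2)
    return 2.0 / ((r + 1) * (r + 2))

def m(r):                      # assumed p_0 = uniform on [0, 1/2]
    return 0.5 ** r / (r + 1)

N = 1500
u = [0.0] * (N + 1)            # u[n] = W_n (1-beta)^(1-n)
for n in range(1, N + 1):
    u[n] = b * sum(u[n - r] * mu(r) for r in range(1, n)) + m(n)

# c from Lemma 2, with the series truncated at N
c = math.gamma(alpha + 1) * beta / gamma_beta * sum(u[1:])
ratio = u[N] * N ** alpha / c  # Lemma 2: u_n n^alpha -> c, i.e. ratio -> 1
print(c, ratio)
```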
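To complete the proof using the lemma, we look at (2.2) and get

$$p_n\Big(1-\frac{x}{n},1\Big) = \sum_{k=0}^{n-1}\frac{W_{n-k}}{W_n}(1-\beta)^k\beta\int_{1-x/n}^{1}y^k\,q(dy) + \frac{(1-\beta)^n}{W_n}\int_{1-x/n}^{1}y^n\,p_0(dy).$$

The second term vanishes asymptotically, as

$$\frac{(1-\beta)^n}{W_n}\int_{1-x/n}^{1}y^n\,p_0(dy) \le \frac{(1-\beta)^n\,m_n}{W_n} \sim \frac{1-\beta}{c}\,m_n\,n^{\alpha} \to 0,$$

using our assumption that $m_n/\mu_n \to 0$. The first term is asymptotically equivalent to

$$n^{\alpha}\sum_{r=0}^{n-1}W_{n-r}\,c^{-1}(1-\beta)^{1-n+r}\,\beta\int_{1-x/n}^{1}y^r\,q(dy).$$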



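By choosing a large $M$, the contribution coming from terms with $r < n-M$ can be bounded
by a constant multiple of

$$\sum_{m>M}W_m(1-\beta)^{1-m},$$

which is bounded by an arbitrarily small constant. For the remaining terms we can now use that

$$\int_{1-x/n}^{1}y^r\,q(dy) \sim \int_0^x e^{-a}\,dq\Big(1-\frac{a}{n},1\Big) \sim \alpha\,n^{-\alpha}\int_0^x a^{\alpha-1}e^{-a}\,da,$$

and a change of variables to obtain equivalence to

$$\alpha\beta c^{-1}\sum_{m=1}^{\infty}W_m(1-\beta)^{1-m}\int_0^x a^{\alpha-1}e^{-a}\,da,$$

and the result follows as, by Lemma 2,

$$\alpha\beta c^{-1}\sum_{m=1}^{\infty}W_m(1-\beta)^{1-m} = \frac{\alpha\,\gamma(\beta)}{\Gamma(\alpha+1)} = \frac{\gamma(\beta)}{\Gamma(\alpha)},$$

as required.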
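The key asymptotic step in this argument, $\int_{1-x/n}^1 y^n\,q(dy) \sim \alpha n^{-\alpha}\int_0^x a^{\alpha-1}e^{-a}\,da$, is easy to see numerically for the illustrative choice $q(dy) = 2(1-y)\,dy$ (an assumption; then $\alpha = 2$ and the right-hand side is $2n^{-2}(1-(1+x)e^{-x})$). The hypothetical helper `tail_q_moment` uses the closed form of the integral.

```python
import math

def tail_q_moment(k, a):
    """int_a^1 y^k q(dy) for the assumed q(dy) = 2(1-y) dy (closed form)."""
    return 2.0 * ((1 - a ** (k + 1)) / (k + 1) - (1 - a ** (k + 2)) / (k + 2))

x = 1.5
limit = 2.0 * (1.0 - (1.0 + x) * math.exp(-x))   # alpha int_0^x a^(alpha-1) e^(-a) da, alpha = 2
for n in (10, 100, 10_000):
    print(n, n ** 2 * tail_q_moment(n, 1 - x / n), limit)
```

The substitution $y = 1-a/n$ turns $y^n$ into $(1-a/n)^n \approx e^{-a}$, which is where the exponential factor of the gamma density comes from.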



3. Discussion 

Kingman's model is on the one hand one of the simplest models in which a condensation effect 
can be observed, on the other hand it is sufficiently rich to study the emergence of condensation 
as a dynamical phenomenon. The simplicity of the model allows a rigorous treatment with 
elementary means, but we believe that our calculation has far reaching consequences as a variety 
of much more complex models in quite diverse areas of science have similar features. Among 
the models we expect to share many features with Kingman's model are models of the physical 
phenomenon of Bose-Einstein condensation, of wealth condensation in macroeconomics, or the 
emergence of traffic jams. 
Our main conjecture is that in a large universality class of models in which effects similar to
mutation and selection compete effectively on a bounded and continuous state space, the 'wave'
moving towards the maximal state forming the condensate is of a gamma shape.

Random models which are suitable test cases for our universality claim arise, for example, in 
the study of random permutations with cycle weights. Here the probability of a permutation $\sigma$
in the symmetric group on n elements is defined as 



where $R_j(\sigma)$ is the number of cycles of length $j$ in $\sigma$ and $h_n$ is a normalisation constant. For
our investigation we focus on the case that $\theta_j \sim j^{\gamma}$ for $\gamma \in \mathbb{R}$. We now discuss results of Betz,
Ueltschi and Velenik [3] and Ercolani and Ueltschi [7] in our context.

Our interest is in the empirical cycle length distribution which is the random measure on [0, 1] 
given by 



where the integers $\lambda_1 \ge \lambda_2 \ge \cdots$ are the ordered cycle lengths of a permutation chosen randomly
according to $P_n$. The asymptotic behaviour of $\mu_n$ shows three phases depending on the value of
the parameter $\gamma$, see Table 1 in [7]:

• If $\gamma < 0$, large cycles are preferred and the empirical cycle length distribution concentrates
asymptotically in the point 1,

• if $\gamma = 0$, there is no condensation and we have convergence to a beta distribution,

• if $\gamma > 0$, we see a preference for short cycles and the empirical cycle length distribution
concentrates asymptotically in the point 0.
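Samples from these three phases are easy to generate. The sketch below rests on two assumptions that the text does not spell out here, because the displayed formulas are missing: that $P_n(\sigma) = \prod_j \theta_j^{R_j(\sigma)}/(n!\,h_n)$ with $\theta_j = j^{\gamma}$, so that $h_k = \frac1k\sum_{j=1}^k \theta_j h_{k-j}$ and $h_0 = 1$; and that the empirical measure puts mass $\lambda_i/n$ at $\lambda_i/n$. Under the first assumption the cycle through a fixed element has length $j$ with probability $\theta_j h_{n-j}/(n h_n)$, and the remaining elements again form a weighted random permutation, which gives a sequential sampler. The function names are hypothetical.

```python
import random

def sample_cycle_lengths(n, gamma, rng=random):
    """Cycle lengths (decreasing) of a random permutation of n elements with
    cycle weights theta_j = j**gamma, under the assumed normalisation
    P_n(sigma) = prod_j theta_j**R_j(sigma) / (n! h_n)."""
    theta = [0.0] + [j ** gamma for j in range(1, n + 1)]
    h = [1.0] + [0.0] * n
    for k in range(1, n + 1):                 # h_k = (1/k) sum_j theta_j h_{k-j}
        h[k] = sum(theta[j] * h[k - j] for j in range(1, k + 1)) / k
    lengths, k = [], n
    while k > 0:
        # length of the cycle through the smallest of the k remaining elements
        weights = [theta[j] * h[k - j] for j in range(1, k + 1)]
        j = rng.choices(range(1, k + 1), weights=weights)[0]
        lengths.append(j)
        k -= j
    return sorted(lengths, reverse=True), h

def empirical_mass(lengths, n, lo, hi):
    """Mass of (lo, hi] under the assumed empirical measure sum_i (l_i/n) delta_{l_i/n}."""
    return sum(length / n for length in lengths if lo < length / n <= hi)

rng = random.Random(0)
n = 500
for gamma in (-1.0, 0.0, 1.0):
    lengths, _ = sample_cycle_lengths(n, gamma, rng)
    print(gamma, empirical_mass(lengths, n, 0.9, 1.0), empirical_mass(lengths, n, 0.0, 0.1))
```

For $\gamma = 0$ all weights equal 1, $h_n = 1$ and the sampler reduces to a uniform random permutation.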

In the two phases in which we see a condensation effect we have partial information on the shape
of the wave, which is consistent with our universality claim.

Let us first look at the case $\gamma > 0$, when the empirical cycle length distribution concentrates in
the left endpoint of our domain, i.e. the normalised cycle lengths vanish asymptotically. In this
case Theorem 5.1 of [7] shows that, for $\alpha =$



i.e. focusing on the left edge of the domain in the scale $1/n^{\alpha}$ we see a gamma distributed wave
shape with parameter $\gamma$, at least in the mean. It is a natural conjecture that this convergence
holds not only in expectation, but also in probability, and establishing this fact is the subject of an
ongoing project.

If $\gamma < 0$, large cycles are preferred. Here the situation is slightly different because the wave
sweeping towards the maximal normalised cycle length is on the critical scale $1/n$, and this means
that we expect the discrete nature of $\mu_n$ to be retained in the limit.

More precisely, Theorem 3.2 of [3] implies that 





i=l 




m 




n=0 
where $c^*$ is a 'Malthusian parameter' chosen such that

oo 
n.=l 

We further note that $h_n \sim C\,n^{\gamma-1}$ by [7, (7.1)] and so we are still able to recognise a discrete
form of a gamma distribution with parameter $\gamma$ in this case.

The most elaborate model in which we were able to test our hypothesis is a random network 
model with fitness. We now give an informal preview of forthcoming results of Dereich [5], which 
are motivated by a problem of Borgs et al. [1]. 

A preferential attachment network model is a sequence of random graphs $(G(n))_{n\in\mathbb{N}}$ that is built
dynamically: one starts with a graph $G(1)$ consisting of a single vertex 1 and, in general, the
graph $G(n+1)$ is built by adding the vertex $n+1$ to the graph $G(n)$ and by insertion of edges
connecting the new vertex to the graph $G(n)$ according to an attachment rule. Typically, the
attachment rule rewards vertices that already have a high degree: in most cases the degree of a
vertex has an affine influence on its attractiveness in the collection of new edges. In a preferential
attachment model with fitness one additionally assigns each vertex an intrinsic fitness, a positive
number, which has a linear impact on its attractiveness in the network formation.

Let us be more precise about the variant of the network model to be considered in the rest of
this paper. We consider a sequence of random directed graphs $(G(n))_{n\in\mathbb{N}}$ and denote by

$$\mathrm{imp}_n(m) := \mathrm{indegree}_{G(n)}(m) + 1$$

the impact of the vertex $m \in \{1,\dots,n\}$ in $G(n)$. Further, let $F_1, F_2, \dots$ denote a sequence
of independent $q$-distributed random variables modelling the fitness of the individual vertices
$1, 2, \dots$. The attachment rule is as follows: given the graph $G(n)$ and all fitnesses, link $n+1$ to
each individual vertex $m \in \{1,\dots,n\}$ with an independent Poisson distributed number of edges
with parameter

$$\frac{1}{nZ_n}\,F_m\,\mathrm{imp}_n(m),$$

where $Z_n$ is a normalisation which depends only on $G(n)$ and the fitnesses. Note that all links
point from new to old vertices so that orientations can be recovered from the undirected set of
edges. We consider two types of normalisations:

(1) adaptive normalisation: $Z_n = \frac{1}{\lambda n}\sum_{m=1}^{n}F_m\,\mathrm{imp}_n(m)$ with a parameter $\lambda > 0$,

(2) deterministic normalisation: $(Z_n)$ is a deterministic sequence.

In the case of adaptive normalisation, the outdegree of $n+1$ is Poisson distributed with parameter
$\lambda$ (the parameters of the independent Poisson variables add up to $\lambda$), even when conditioning on the graph $G(n)$. Hence, the total number of edges is almost
surely of order $\lambda n$, so that $\frac1n\sum_{m=1}^{n}\mathrm{imp}_n(m)$ converges almost surely to $\lambda+1$.
The analogue of $p_n$ is the impact measure given by

$$\Xi_n = \frac{1}{n}\sum_{m=1}^{n}\mathrm{imp}_n(m)\,\delta_{F_m}.$$

It measures the contribution of the vertices of a particular fitness to the total impact.
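The adaptive-normalisation variant is straightforward to simulate, which gives a feel for the impact measure $\Xi_n$. In the minimal sketch below the fitness distribution is assumed to be $q = \mathrm{Beta}(1,2)$, i.e. $q(dx) = 2(1-x)\,dx$; this choice and all names are illustrative. Because $Z_n = \frac{1}{\lambda n}\sum_m F_m\,\mathrm{imp}_n(m)$, the Poisson parameters reduce to $\lambda F_m\,\mathrm{imp}_n(m)/\sum_k F_k\,\mathrm{imp}_n(k)$, so $Z_n$ never has to be formed explicitly.

```python
import numpy as np

def simulate_network(n_final, lam, rng):
    """Preferential attachment with fitness and adaptive normalisation (sketch).

    Fitnesses are drawn from the assumed q = Beta(1, 2). When vertex n+1 arrives, it
    sends an independent Poisson number of edges to each m <= n with mean
    F_m imp_n(m) / (n Z_n) = lam * F_m imp_n(m) / sum_k F_k imp_n(k).
    """
    F = rng.beta(1.0, 2.0, size=n_final)
    imp = np.ones(n_final)                # imp = indegree + 1; every vertex starts at 1
    for n in range(1, n_final):           # array index n is vertex n+1
        weights = F[:n] * imp[:n]
        imp[:n] += rng.poisson(lam * weights / weights.sum())
    return F, imp

def impact_mass(F, imp, lo, hi):
    """Xi_n((lo, hi]) = (1/n) * total impact of vertices with fitness in (lo, hi]."""
    sel = (F > lo) & (F <= hi)
    return imp[sel].sum() / len(F)

rng = np.random.default_rng(1)
F, imp = simulate_network(3000, lam=2.0, rng=rng)
print("total mass of Xi_n:", impact_mass(F, imp, 0.0, 1.0))   # close to 1 + lam
print("mass near fitness 1:", impact_mass(F, imp, 0.95, 1.0))
```

With $\lambda = 2$ one has $\int q(dx)/(1-x) = 2 < 1+\lambda$ for this $q$, so the example lies in the Bose-Einstein regime of the classification that follows.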

As observed in [1] and verified for a different variant of the model in [3], network models with 
fitness show a phase transition similar to Bose-Einstein condensation. The verification of this 
phase transition in the variant considered here is conducted in [6]. 
For adaptive normalisation two regimes can be observed:

[FGR] $\int\frac{q(dx)}{1-x} > 1+\lambda$: the fit-get-richer phase,

[BE] $\int\frac{q(dx)}{1-x} < 1+\lambda$: the Bose-Einstein phase or innovation-pays-off phase.

In the fit-get-richer phase, the random measures $\Xi_n$ converge almost surely in the weak
topology to the measure $\Xi$ on (0, 1] given by

$$\Xi(dx) = \frac{\lambda^*}{\lambda^*-x}\,q(dx),$$

where $\lambda^* \in [1,\infty)$ denotes the unique solution to

$$\int\frac{\lambda^*}{\lambda^*-x}\,q(dx) = 1+\lambda,$$

whereas, in the Bose-Einstein phase, one observes convergence to

$$\Xi(dx) = \frac{1}{1-x}\,q(dx) + \Big(1+\lambda-\int\frac{1}{1-y}\,q(dy)\Big)\,\delta_1(dx).$$
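For a concrete (assumed) $q(dx) = 2(1-x)\,dx$ one has $\int q(dx)/(1-x) = 2$, so the fit-get-richer phase corresponds to $\lambda < 1$ and the Bose-Einstein phase to $\lambda > 1$. The integral defining $\lambda^*$ then has the closed form $G(s) = 2s\,\big(1-(s-1)\log\frac{s}{s-1}\big)$ for $s > 1$, which decreases from 2 to 1, so $\lambda^*$ can be found by bisection. A minimal sketch, with hypothetical helper names:

```python
import math

def G(s):
    """int s/(s - x) q(dx) for the assumed q(dx) = 2(1-x) dx and s > 1 (closed form)."""
    return 2.0 * s * (1.0 - (s - 1.0) * math.log(s / (s - 1.0)))

def lambda_star(lam, tol=1e-12):
    """Solve G(s) = 1 + lam by bisection (fit-get-richer phase: 0 < lam < 1 here)."""
    target = 1.0 + lam
    if not 1.0 < target < 2.0:
        raise ValueError("lam is not in the fit-get-richer phase for this q")
    lo, hi = 1.0 + 1e-15, 2.0
    while G(hi) > target:          # G decreases from 2 to 1, so enlarge the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if G(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def be_atom(lam):
    """Atom at 1 in the Bose-Einstein phase: 1 + lam - int q(dy)/(1-y), here 1 + lam - 2."""
    if lam <= 1.0:
        raise ValueError("lam is not in the Bose-Einstein phase for this q")
    return 1.0 + lam - 2.0

print(lambda_star(0.5), be_atom(2.0))
```

As $\lambda \uparrow 1$ the root $\lambda^*$ tends to 1 and the fit-get-richer limit merges continuously with the Bose-Einstein limit, whose atom has mass $\lambda - 1$ for this $q$.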

In order to analyse the emergence of the condensation phenomenon, we consider the preferential
attachment model with deterministic normalisation. We assume that $q$ is regularly varying at 1
with representation

$$q(1-h,1) = h^{\alpha}\,\ell(h),$$

where $\ell : [0,1] \to (0,\infty)$ is a slowly varying function. In order to replicate the Bose-Einstein phenomenon
in the model with deterministic normalisation, one needs to choose $(Z_n)$ appropriately.
For $1 \le m \le n$, let

[nj ^ ^ 

T[m,n] := ^ — 

k= [mj 

The Bose-Einstein phenomenon can be replicated by choosing (Zn) such that 

1 - Z„ ~ a(logn)~-^ 

and such that the limit 

a (logn)° • log(logn)° 

7:= lim -F a exp{Tlogn,n} 3.1 

n-!>oo a — 1 i({iogn) 

exists. We stress that such a normalisation can be found for various fitness distributions q and 
we refer the reader to the article [5] for the details. 

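Theorem 3. Under the above assumptions, one has, for $x > 0$,

$$\lim_{n\to\infty}\Xi_n\Big(1-\frac{x}{\log n},\,1\Big) = \frac{\gamma}{\Gamma(\alpha)}\int_0^x y^{\alpha-1}e^{-y}\,dy, \qquad\text{in probability}.$$

For any measurable set $A \subset [0,1]$ with $1 \notin \partial A$, one has

$$\lim_{n\to\infty}\Xi_n(A) = \Xi(A), \qquad\text{in probability},$$

for the measure $\Xi$ on [0, 1] given by

$$\Xi(dx) = \frac{1}{1-x}\,q(dx) + \gamma\,\delta_1(dx).$$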
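The limit shape in Theorems 1 and 3 is a multiple of the regularised lower incomplete gamma function $P(\alpha, x) = \Gamma(\alpha)^{-1}\int_0^x y^{\alpha-1}e^{-y}\,dy$. For comparing simulations with these limits one can evaluate it with the standard power series $P(\alpha,x) = x^{\alpha}e^{-x}\sum_{k\ge0}x^k/\Gamma(\alpha+k+1)$. The sketch below does this in plain Python; the function name is hypothetical, and a fixed number of terms is adequate only for moderate $x$.

```python
import math

def gamma_wave_cdf(x, alpha, terms=200):
    """P(alpha, x) = Gamma(alpha)^(-1) int_0^x y^(alpha-1) e^(-y) dy via the series
    x^alpha e^(-x) sum_{k>=0} x^k / Gamma(alpha + k + 1); fine for moderate x."""
    if x <= 0.0:
        return 0.0
    total, term = 0.0, 1.0 / math.gamma(alpha + 1.0)
    for k in range(terms):
        total += term
        term *= x / (alpha + k + 1.0)   # next term x^(k+1) / Gamma(alpha + k + 2)
    return x ** alpha * math.exp(-x) * total

# Shape of the wave in (1.3): gamma(beta) * P(alpha, x); e.g. gamma(beta) = 0.5, alpha = 2
for x in (0.5, 1.0, 3.0):
    print(x, 0.5 * gamma_wave_cdf(x, 2.0))
```

For integer $\alpha$ the series reproduces the familiar closed forms, e.g. $P(2,x) = 1-(1+x)e^{-x}$.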
Remark 1. In most cases one cannot give an explicit representation for a normalisation $(Z_n)$
satisfying (3.1). At first sight, this might be surprising since the $(Z_n)$ play a role analogous to
$(W_n)$ in the Kingman model, where the analysis is feasible. The difference between the two models stems
from the stochastic nature of the network model. In order to analyse the network model one could
start to work with expectations, resulting in a mean field model similar to the Kingman model.
However, the expectations for $\Xi_n$ are dominated by configurations that are not seen in typical
realisations: vertices of particularly high fitness that are born very early contribute most although
they are typically not present. To compensate for this, the normalisations in the network model have to
be slightly smaller than a mean field model would suggest. Vertices of particularly high fitness
have an impact only with a delay. This causes the $T[\log n, n]$ term in (3.1) and makes explicit
representations for $(Z_n)$ in many cases infeasible.

We conclude our discussion with the remark that the case of unbounded fitness distribution is 
also of considerable interest. In this case Park and Krug [10] have studied the analogue of 
Kingman's model and (in a particular case) observed emergence of a travelling wave of Gaussian 
shape. They also conjecture that this behaviour is of universal nature. 